JMIR Public Health and Surveillance — Latest Matching Preprints

1

Analytical Centralization of Health Expenditure at the National Administrator of Health System Resources: Architecture, Data Quality, and Operational Performance of the ADRES Health System Analytics Platform, Colombia

Garavito Jimenez, D. A.; Bello Angulo, D. E.; Mejia Lemus, L. T.; Chipatecua, D.; Fula, D. D.; Perez-Rubiano, S.; Martinez, F. L.; Bohorquez Pinzon, J. C.

2026-06-10 public and global health 10.64898/2026.06.08.26355159 medRxiv

Top 0.1%

8.2%

Background Between 2024 and 2025, Colombia universalized the Electronic Health Invoice with embedded Individual Health Services Delivery Records (RIPS), known as FEV-RIPS, as the standard for financial and clinical data exchange. ADRES, the entity responsible for administering the resources of Colombia's General Social Security Health System, faced the challenge of processing information from multiple heterogeneous sources generated by more than 55,000 healthcare providers. Health systems in high-income countries converge clinical and financial data in consolidated platforms; Colombia, by contrast, started from a fragmented architecture with incompatible historical sources, no cross-database standardization, and no centralized analytical infrastructure until 2023. Objective We describe the design of the analytical infrastructure built by ADRES to centralize large-scale processing of Colombian health system information, the technical challenges of integrating heterogeneous data, and the platform's operational performance, and we derive transferable lessons for health system resource administrators in Latin America facing equivalent digitalization mandates. Methods Technical-descriptive report based on operational metrics from the ADRES Azure/Databricks environment during January-November 2025. We report indicators of data volume, processing speed, computational capacity, concurrent use by functional group, and governance structure. The architecture integrates VPN connectivity with MinSalud, automated processing of multiple formats (XML, relational tables, flat files), and a medallion data lake (Bronze/Silver/Gold). Data quality challenges include structural inconsistencies across sources, coding incompatibilities (municipalities, dates, diagnoses), format heterogeneities in unstructured data, and absent technical documentation. 
Results The platform manages 21 catalogs, 1,183 tables, and over 110,645 million stored records, with cumulative production exceeding 1 trillion processed records. It executes queries on 100 billion records in ten seconds using clusters of up to 32 TB of RAM and 4,096 vCPUs. During September-October 2025, monthly query peaks reached 78,028 across eleven functional groups. Integration required Python/PySpark parsers for variable-depth XML, equivalence tables for incompatible municipality codes, cleaning routines for extreme dates used as nulls (1900-01-01, 9999-12-31), and transformation logic bridging classic RIPS and FEV-RIPS. The platform supported econometric analyses, responses to judicial mandates, and public interactive dashboards. Conversational AI integration (Genie, Copilot) extends analytical access to users without SQL knowledge. Conclusions In one year, ADRES built an analytical infrastructure; this report provides, to our knowledge, the first published documentation of the systemic technical challenges of integrating heterogeneous data sources in a middle-income social security health system. Centralizing health system information at national scale is technically feasible under public institutional constraints, but it requires solving cross-source standardization problems that the implementation literature does not document with quantitative precision. The derived lessons are transferable to health system resource administrators in Latin America facing equivalent challenges.
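The date-cleaning rule mentioned in the Results (extreme dates used as nulls) can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not ADRES's PySpark code: the function name `clean_sentinel_date` and the ISO `YYYY-MM-DD` input format are hypothetical; only the two sentinel values come from the abstract.

```python
from datetime import date
from typing import Optional

# Sentinel values reported in the abstract as stand-ins for "no date".
SENTINEL_DATES = {date(1900, 1, 1), date(9999, 12, 31)}


def clean_sentinel_date(raw: Optional[str]) -> Optional[date]:
    """Parse an ISO 'YYYY-MM-DD' string (assumed input format) and return None
    for empty, unparseable, or sentinel values so they behave as true nulls."""
    if raw is None or not raw.strip():
        return None
    try:
        parsed = date.fromisoformat(raw.strip())
    except ValueError:
        # Malformed dates (e.g. month 13) are treated as missing, not as errors.
        return None
    return None if parsed in SENTINEL_DATES else parsed


records = ["2025-03-14", "1900-01-01", "9999-12-31", "", "2025-13-01"]
print([clean_sentinel_date(r) for r in records])
```

In a PySpark pipeline the same rule would more likely be written as a column expression (for example with `pyspark.sql.functions.when`); the plain-Python version only keeps the sketch self-contained.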

2

End of Average. Understanding Overweight & Obesity: Rationale and Design.

Vanbrabant, E.; Roefs, A.; Goossens, G.; Lemmens, L.; Shapovalova, Y.; Hesen, J.; Mironiuc, C.

2026-06-08 primary care research 10.64898/2026.06.05.26354975 medRxiv

Top 0.4%

4.3%

Background: Obesity is globally recognized as a complex, multifactorial chronic disease, with biological, psychological, environmental and behavioural factors involved in both disease pathogenesis and maintenance. Although previous group-based studies demonstrated the involvement of each of these factors, there is large inter-individual variability in the factors contributing to disease development and in intervention outcomes, which limits translatability to the individual level. This heterogeneity in treatment effectiveness might be due to differing causal and maintenance factors of obesity. To enable the transition from a one-size-fits-all approach to a more personalized approach for individuals with overweight or obesity, this study aims to investigate whether and how the degree of weight loss and changes in daily life behaviour after a combined lifestyle intervention depend on individual baseline profiles comprising person characteristics and biological, psychological, environmental and behavioural factors. Methods: This study will include 600 individuals varying in BMI: 200 participants with a healthy BMI (18.5-24.9 kg/m²), 200 with overweight (BMI 25.0-29.9 kg/m²), and 200 with obesity (BMI ≥30.0 kg/m²). For all participants, a comprehensive individual baseline profile is created, including person characteristics and biological, psychological, environmental and behavioural factors. A clustering method is applied to identify clusters of participants with similar characteristics. Next, we examine whether and how these clusters are linked to body weight indicators measured at baseline, and how they relate to daily lifestyle behaviour as measured by ecological momentary assessment (EMA) using a smartphone app and sensor technology (3-week measurements). 
Individuals with overweight or obesity will be randomized to the intensive lifestyle intervention or a lifestyle information condition, to determine if treatment response can be predicted based on cluster characteristics, how daily lifestyle behaviour changes after an intervention, and how changes in daily lifestyle behaviour relate to treatment response. Discussion: The End of Average study aims to characterize a large set of individuals varying in body weight to predict intervention effectiveness measured as changes in body weight indicators and in daily lifestyle behaviours. If reliable predictors of treatment success can be identified, these can be applied in personalized lifestyle interventions to improve lifestyle behaviour, body weight management and overall health.

3

Prescription intervals of medications for chronic use: a cohort study

Muddiman, R.; Donoghue, P.; Gomez Lemus, J.; Doherty, A. S.; Boland, F.; McCarthy, C.; Moriarty, F.

2026-06-09 primary care research 10.64898/2026.06.08.26355164 medRxiv

Top 0.4%

4.2%

Purpose In deprescribing studies, a prescription-free gap is typically used to determine whether patients discontinued their treatment. An appropriate gap depends on the typical time between prescriptions during continued use. This work aims to characterise the interval between prescriptions of chronic drugs using different methods for a cohort of older people in primary care in Ireland. Methods The empirical prescription interval was analysed for 38,154 patients for the twenty most common drug classes, and the association between covariates and the interval was analysed using a multi-level model. Estimates were also compared with those obtained from the parametric waiting time distribution (pWTD) approach. Results Available covariates had consistent relationships with prescription intervals across drug classes. For example, each additional prescription issue was associated with an increase in the interval of 5.0 (NSAIDs) to 19.7 days ("Other antidepressants"). Full public health cover was associated with a change of -29.0 days (inhaled adrenergics) to -11.0 days (opioids) relative to partial cover, while other/private cover was associated with a change of -17.9 days (benzodiazepines and associated drugs) to -7.1 days (SSRIs and SNRIs) relative to partial cover. The pWTD also produced consistent estimates of the population interval for most drugs. Conclusions The interval varied substantially within drug classes, owing to a mixture of patient, practice and unmodelled factors. Variation between practices was effectively explained, with residual variation remaining between and within patients. The pWTD approach is useful for describing complex distributions of intervals and may be more appropriate for inferring a gap than summarising truncated data.
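To make the gap logic concrete: the empirical intervals are the day differences between consecutive prescription issues, and discontinuation is inferred when the time since the last issue exceeds a chosen prescription-free gap. The sketch below is illustrative only; the function names are hypothetical, and the 90-day gap is an arbitrary assumption, not a value from the study. The pWTD estimation itself is not implemented here.

```python
from datetime import date
from typing import List


def prescription_intervals(issue_dates: List[date]) -> List[int]:
    """Days between consecutive prescription issues for one patient and drug class."""
    ordered = sorted(issue_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def is_discontinued(issue_dates: List[date], follow_up_end: date, gap_days: int) -> bool:
    """Flag discontinuation when no new issue occurs within `gap_days` of the last one.

    `gap_days` is a hypothetical analyst-chosen threshold, not a value from the study.
    """
    last_issue = max(issue_dates)
    return (follow_up_end - last_issue).days > gap_days


issues = [date(2024, 1, 1), date(2024, 1, 29), date(2024, 2, 26), date(2024, 3, 25)]
print(prescription_intervals(issues))  # [28, 28, 28]
print(is_discontinued(issues, follow_up_end=date(2024, 9, 1), gap_days=90))  # True
```

The study's argument is that `gap_days` should be informed by the observed interval distribution for each drug class rather than fixed in advance as done here.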

4

Combining centralized and decentralized approaches to assess and ensure data quality in Eurocrine® via Microsoft Power BI and DataquieR

Musholt, T. J.; Clerici, T.; Bergenfelz, A.; Schmidt, C. O.; Struckmann, S.

2026-06-05 health informatics 10.64898/2026.06.04.26354884 medRxiv

Top 0.5%

3.9%

Background: Medical registries have gained importance in the evaluation of healthcare quality outcomes. In the absence of high-quality evidence, such as randomized controlled trials, studies based on registry data are essential for informing clinical guidelines. Methods for assessing data quality are rarely described in detail. To ensure the credibility of registry-based studies, registries must use all available technical and operational means to guarantee high data quality. Method: Eurocrine® is a pan-European endocrine surgical database and quality registry, initially funded by the EU healthcare programme, which started in 2015 and included more than 200,000 interventions as of April 2025. To ensure high data quality, interactive, standardized reports are generated in Microsoft Power BI, both centrally and locally. In addition, comprehensive data quality analyses were performed using the R package dataquieR. Results: Although many technical measures (for example, input screen design and real-time plausibility checks during data entry) are in place, they are not sufficient to prevent human errors at data entry. Errors identified in the reports were corrected, and preventive measures were implemented. Overall, data quality was assessed as very good in terms of completeness, accuracy, and consistency. Conclusion: It is very important to provide registry users with an efficient, smart tool to identify data issues, as they hold the clinical information needed to correct them. Data quality reports generated with dataquieR are an effective tool for registry administrators, and predesigned Microsoft Power BI reports enable participating Eurocrine® clinics to self-audit their data.

5

Integrating a Non-Communicable Disease Care Cascade within Ghana's Community-Based Health Planning and Services (CHPS) Program: the COMBINE Pilot Implementation Trial

Heller, D. J.; Elkersh, Y.; Nonterah, E. A.; Kuwolamo, I.; Horowitz, C. R.; Alvarez, E. E.; Awine, T.; Govindarajulu, U.; Squires, A. P.; Aborigo, R. A.

2026-06-05 primary care research 10.64898/2026.06.03.26354834 medRxiv

Top 0.6%

3.6%

Introduction: Hypertension is the world's leading cause of death, and depression its leading cause of disability. Control rates for these noncommunicable diseases (NCDs) are low in low- and middle-income countries (LMICs). Many LMICs have programs to screen and treat underserved communities for infectious diseases, but evidence on adapting them to treat NCDs is limited. We developed and tested an NCD program through Ghana's Community-Based Health Planning and Services (CHPS) primary care initiative. Methods: We trained 8 CHPS nurses to diagnose and treat hypertension and depression through door-to-door screening and pharmacotherapy. Physician assistants provided telehealth supervision. We combined this treatment with volunteer counseling to boost medication adherence, improve mood, and change health behaviors. We called the 90-day intervention the CHPS Opportunity for Mentally and Behaviorally Integrated NCD Engagement (COMBINE). Results: We recruited 60 adults from 580 screened: 37 with hypertension (mean blood pressure (BP) 149/91 mm Hg) and 23 with depression (mean Patient Health Questionnaire-9 (PHQ-9) score 13.3). After 90 days, 57/60 (95%) completed the intervention: 32/37 (86%) achieved blood pressure control (mean BP 122/75 mm Hg), and 19/20 (95%) achieved depression control (mean PHQ-9 score 2.0). After 12 months, 51/60 were retained: 33/37 with hypertension (89%) and 18/23 with depression (78%), with a mean BP of 121/75 mm Hg and a mean PHQ-9 score of 1.4, respectively. All 51 (100%) achieved disease control at 12 months. Five participants left because of migration and four because of escalation to higher-level care. Conclusions: The COMBINE model achieved high levels of diagnosis, care retention, and disease control, with minimal adverse events, in a remote setting with limited usual NCD care. 
This model suggests a novel means to improve the care cascade for these and other noncommunicable diseases through existing non-physician care models in LMICs, warranting further controlled testing at scale.

6

Estimating Infectious Disease Importation Risk during the 2026 FIFA World Cup

Herrera-Diestra, J. L.; Bi, K.; Ptak, S.; Ertem, Z.; Al-amery, A.; Harris, M.; Meyers, L. A.

2026-06-04 public and global health 10.64898/2026.06.03.26354828 medRxiv

Top 0.7%

3.5%

Background. The 2026 FIFA World Cup will bring an estimated 1-5 million international visitors to 11 US host cities between June 11 and July 19, 2026, making it the largest tournament in history. Large-scale international gatherings accelerate the importation of infectious diseases from diverse source populations. Advance estimation of importation risk is essential for public health preparedness and surveillance prioritization. Methods. We developed a Poisson importation framework applied to five diseases (dengue fever, influenza, malaria, measles, and pertussis) across the 11 US venue cities. Three nested travel models of increasing resolution were constructed: a baseline model using routine June 2024 arrival data; a World Cup-adjusted model incorporating projected visitor growth factors; and a schedule-driven model routing World Cup fans to specific cities based on match assignments. WHO incidence data and BTS T-100 routing fractions were combined with Monte Carlo uncertainty propagation (5,000 uniform draws on under-reporting and travel-while-infectious parameters) to yield median importation estimates with 95% uncertainty intervals. Results. Dengue posed the highest importation risk at most venue cities under the schedule-driven model (median Λ > 10 expected importations from Brazil alone; 95% uncertainty interval 5.9-33.1), robust across the full literature-supported parameter range; Atlanta was the exception, where malaria probability exceeded dengue, driven by direct travel from West and Central African nations. Influenza ranked second at most cities, coinciding with the Southern Hemisphere winter peak. Pertussis showed broad geographic spread but carried the widest relative uncertainty, as the assumed detection rate sits at the upper bound of the literature range. Background tourism accounted for the dominant share of total importation risk; the World Cup fan increment contributed approximately 8.3% of projected arrivals for World Cup-qualified nations. Conclusions. 
This Poisson importation framework, built entirely from publicly available data, provides reproducible importation risk estimates for mass gathering events. The framework extends to additional diseases, cities, and gatherings, offering a transparent baseline complementary to proprietary modeling systems.
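A Poisson importation model with Monte Carlo uncertainty can be sketched as follows. Here the expected number of importations Λ is assumed to be arrivals × incidence × an under-reporting multiplier × the probability of travelling while infectious, and the probability of at least one importation is 1 - exp(-Λ). The product form, function names, parameter ranges and input numbers are all hypothetical placeholders, not the authors' model or values; only the idea of uniform draws on the two uncertain parameters, summarized as a median with a 95% interval, follows the abstract.

```python
import math
import random
import statistics
from typing import Tuple


def prob_at_least_one(expected: float) -> float:
    """P(N >= 1) when N ~ Poisson(expected)."""
    return 1.0 - math.exp(-expected)


def simulate_expected_importations(
    arrivals: float,
    incidence: float,
    reporting_multiplier: Tuple[float, float],
    p_travel_infectious: Tuple[float, float],
    n_draws: int = 5000,
    seed: int = 2026,
) -> Tuple[float, float, float]:
    """Return (median, 2.5th percentile, 97.5th percentile) of expected importations.

    Hypothetical model: Lambda = arrivals * incidence * multiplier * p_travel,
    with the two uncertain parameters drawn from uniform ranges.
    """
    rng = random.Random(seed)
    draws = [
        arrivals * incidence
        * rng.uniform(*reporting_multiplier)
        * rng.uniform(*p_travel_infectious)
        for _ in range(n_draws)
    ]
    cut_points = statistics.quantiles(draws, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return statistics.median(draws), cut_points[0], cut_points[-1]


median, lower, upper = simulate_expected_importations(
    arrivals=200_000, incidence=1e-4,
    reporting_multiplier=(2.0, 10.0), p_travel_infectious=(0.1, 0.5),
)
print(f"median Lambda {median:.1f} (95% UI {lower:.1f}-{upper:.1f}); "
      f"P(at least one import) {prob_at_least_one(median):.3f}")
```

Summing Λ over source countries gives a city-level total, since independent Poisson counts add; that aggregation step is omitted here.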

7

Alcohol Consumption Patterns and Sociodemographic Correlates Among US Adults with Cardiovascular Disease: A Cross-Sectional Analysis of All of Us and NHANES

Yang, Q.; Yu, J.; Zhao, H.; Zou, M.; Sun, Y.

2026-06-09 public and global health 10.64898/2026.06.06.26355052 medRxiv

Top 1%

1.9%

This cross-sectional study aimed to examine the prevalence of alcohol use and its sociodemographic correlates among adults with cardiovascular disease (CVD). We analyzed data from two large US cohorts: the All of Us Research Program (2017-2023) and the National Health and Nutrition Examination Survey (NHANES, 1999-2016). Both CVD diagnosis and past-year alcohol consumption were self-reported. Risky drinking was defined as exceeding moderate drinking or binge drinking (All of Us), or moderate/heavy drinking (NHANES). Multivariable logistic regression was used to examine associations with sociodemographic and lifestyle factors. Among 32,788 current drinkers with CVD in the All of Us cohort, 15% exceeded moderate drinking thresholds and 26% reported binge drinking. Older age, female sex, and higher socioeconomic status were inversely associated with risky drinking, whereas smoking was positively associated. In NHANES, the prevalence of moderate drinking rose from 47.3% to 57.2% and that of heavy drinking from 6.7% to 7.2% over the study period. Moderate/heavy drinking was positively associated with age <65 years and inversely associated with age ≥65 years. Higher education and income were linked to moderate drinking, while current smoking was strongly associated with heavy drinking. These results highlight the need to integrate holistic screening for alcohol use, tobacco use, and social context into routine cardiovascular care.

8

Perceived Social Support and Self-Efficacy as Mediators Between Health Literacy and Quality of Life Among Middle-Aged and Older Adults with Hypertension: A Cross-Sectional Study in Six Central Provinces of China

Zhao, Y.; Yun, Y.; Bai, T.; Xiong, L.; Ruan, Y.; Zhao, H.; Wang, W.; Wang, F.

2026-06-08 public and global health 10.64898/2026.06.06.26355051 medRxiv

Top 1%

1.9%

Objective: The onset of hypertension occurs at a younger age in China, and the relationship between health literacy and quality of life among middle-aged and older hypertensive patients remains unclear. This study explored whether perceived social support and self-efficacy mediate the association between health literacy and quality of life in middle-aged and older hypertensive patients. Methods: A questionnaire was administered to 1,015 middle-aged and older hypertensive adults from communities in six central provinces of China. The EQ-5D scale, Perceived Social Support (PSS) scale, Self-Efficacy Scale (SES), and Health Literacy Scale (HLS) were used to assess quality of life, social support, self-efficacy, and health literacy, respectively. Mplus 8.3 software was used to construct a structural equation model for path analysis. Results: The mean PSS, SES, HLS, EQ-5D, and EQ-VAS scores were 15.57 ± 3.45, 10.61 ± 2.41, 9.49 ± 2.86, 0.88 ± 0.18, and 71.06 ± 17.49, respectively. Health literacy and quality of life scores differed significantly among middle-aged and older hypertensive patients, and both were positively correlated with perceived social support and self-efficacy (both P<0.001). Perceived social support and self-efficacy exhibited a chain-mediated effect on the relationship between health literacy and quality of life (EQ-5D utility index and EQ-VAS), accounting for 28.57% of the total effect on the EQ-5D utility index and 27.26% of the total effect on the EQ-VAS. This study is the first to elucidate the mechanism by which health literacy influences quality of life in middle-aged and older hypertensive patients through the chain-mediated effect of perceived social support and self-efficacy. Conclusion: Health literacy is significantly correlated with quality of life in middle-aged and older hypertensive patients. 
Health literacy appears to influence quality of life both directly and indirectly, through mediating pathways involving perceived social support and self-efficacy. Keywords: hypertensive patients, perceived social support, self-efficacy, health literacy, quality of life, mediating effect

9

Knowledge, attitudes and practices regarding risk factors for cardiovascular disease among women in an urban slum of Kathmandu, Nepal: A cross-sectional study.

Kasaju, M.; Shrestha, A. P.; Oli, N.; Vaidya, A.

2026-06-08 public and global health 10.64898/2026.06.04.26354909 medRxiv

Top 2%

1.7%

Introduction: Cardiovascular diseases (CVDs) are the leading cause of death and disability worldwide, and about 75% of CVD deaths occur in low- and middle-income countries (LMICs) such as Nepal. Urbanization and globalization, together with the growth of slum settlements, are the major drivers of the rise in CVDs among the urban poor. This study aims to assess the knowledge, attitudes and practices (KAP) regarding CVDs and their risk factors among women in one such urban poor community in Nepal. Methodology: This cross-sectional study (n=388) in the Sinamangal-Minbhawan slum area used a semi-structured questionnaire based on the WHO STEPS survey and the HARDIC study, administered to participants selected through convenience sampling. Descriptive analysis was done using SPSS version 21, and KAP scores were further categorized based on the median score to perform multivariate logistic analysis. Anthropometric and blood pressure measurements were also recorded and analyzed. Results: The median age of participants was 33 years (interquartile range 17); most were Dalit by ethnicity and housewives, had up to primary-level education, and belonged to the upper-lower socioeconomic class. More than half (53.3%) of the participants were obese and over 23% were hypertensive. While half of the hypertensive women were aware of their status, only 3% had their blood pressure under control. The median knowledge, attitude and practice scores were 12, 60 and 10, respectively. The KAP scores were positively associated with the socioeconomic status of the participants. Conclusion: The study revealed low knowledge and a high prevalence of behavioral risk factors for CVDs, along with a high prevalence of metabolic risk factors such as high body mass index, high waist-hip ratio and hypertension, among women in the slum area, who nonetheless showed a positive attitude toward preventing CVDs and their risk factors.

10

Modeling the Impact of Pediatric RSV Immunization in Massachusetts, 2024-2025

Jones, L.; Ergas, R.; Tibbs, A.; Russo, E. T.; Norville, J.; Bingay, B.; Brown, C. M.; Reich, N. G.; Pasco, R.

2026-06-10 epidemiology 10.64898/2026.06.05.26354236 medRxiv

Top 2%

1.7%

Background Pediatric immunizations for Respiratory Syncytial Virus (RSV), including monoclonal antibodies for infants and vaccines for pregnant people, have become broadly available and can prevent severe RSV outcomes in infants. However, quantifying the population-level impact of RSV immunization on severe pediatric illness is limited by a lack of RSV case surveillance data. The Massachusetts Department of Public Health (DPH) conducted a modeling analysis using routine public health surveillance data to estimate the state-level impact of new RSV immunization products on Emergency Department (ED) visits and hospitalizations in Massachusetts for the highest-risk pediatric groups. Methods A scenario projection tool, R.Scenario.Vax, was used to simulate RSV-associated ED and hospital encounters by age group in the context of newly available immunizations. ED visit and hospitalization data from the National Syndromic Surveillance Program (NSSP) for 10/08/2017-10/19/2024 were analyzed, scaled to account for changes in RSV testing practices over time and for missing encounter volume in historical data, and used to fit the model to a "typical" RSV season. RSV immunization data from the Massachusetts Immunization Information System (MIIS) for the 2023-2024 and 2024-2025 RSV seasons informed high and moderate pediatric RSV immunization coverage scenarios, whose impact was compared with a counterfactual reference scenario of no new immunizations. Median projections were compared quantitatively and qualitatively with observed 2024-2025 season data. Percent reduction in hospital encounters and encounters averted per 10,000 population were calculated for each scenario relative to the reference. Results Projections for the youngest at-risk age groups showed significantly lower RSV-associated ED visits and hospitalizations during the 2024-2025 season under both high and moderate immunization coverage scenarios. 
Median projections for infants under 6 months old in the highest coverage scenario, wherein nearly all infants were immunized, showed 72.6% lower ED visits and 73.4% lower hospitalizations when compared to the reference scenario, equating to 262 ED visits and 85 hospitalizations averted per 10,000 population. Conclusions Our results support the use of modeling methods for public health insights and suggest that RSV immunizations for infant populations result in significantly lower RSV-related ED encounters in Massachusetts.
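The two summary measures named in the Methods are simple comparisons against the no-new-immunization reference. A minimal sketch with invented counts (not the study's projections), assuming both measures are computed from the same projected totals for one age group; the function names are hypothetical:

```python
def percent_reduction(reference: float, scenario: float) -> float:
    """Percent fewer encounters in the scenario than in the reference projection."""
    return 100.0 * (reference - scenario) / reference


def averted_per_10k(reference: float, scenario: float, population: float) -> float:
    """Encounters averted per 10,000 population in the modeled group."""
    return 10_000.0 * (reference - scenario) / population


# Hypothetical projected ED visits for one age group (not study data).
reference_visits, scenario_visits, group_population = 800.0, 200.0, 20_000.0
print(percent_reduction(reference_visits, scenario_visits))  # 75.0
print(averted_per_10k(reference_visits, scenario_visits, group_population))  # 300.0
```

Because the study reports medians across simulations, the two published figures need not combine exactly as they do for the single deterministic projection above.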

11

Early assessment of potential airline-mediated importation risk during the 2026 DRC-Uganda Bundibugyo virus disease outbreak

Kinoshita, R.; Suzuki, M.; Yoneoka, D.

2026-06-09 public and global health 10.64898/2026.06.01.26354569 medRxiv

Top 2%

1.7%

During the 2026 Bundibugyo virus disease outbreak in the Democratic Republic of the Congo and Uganda, we projected potential airline-mediated importation risk using contemporary airline network data and an externally calibrated Ebola importation hazard. Effective-distance analyses identified major international hub countries, including Belgium, France, South Africa, Kenya, and the United Arab Emirates, as higher-probability gateways within 30 days. These early projections provide a reproducible framework for real-time international situational awareness, while emphasizing that importation risk does not imply local transmission risk.
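Effective distance on an airline network is commonly defined, following Brockmann and Helbing, as a route length of 1 - ln P, where P is the fraction of passengers leaving an airport who fly to a given destination; the effective distance from the outbreak source is the shortest path over these lengths. The abstract does not state the authors' exact definition, so this formulation is an assumption, and the node names and flux values below are hypothetical. Dijkstra's algorithm applies because every edge length 1 - ln P is at least 1.

```python
import heapq
import math
from typing import Dict


def effective_distances(flux: Dict[str, Dict[str, float]], source: str) -> Dict[str, float]:
    """Shortest-path effective distance from `source`, with edge length 1 - ln(P)."""
    # Convert outbound passenger volumes into per-edge effective lengths.
    edges: Dict[str, Dict[str, float]] = {}
    for origin, destinations in flux.items():
        total = sum(destinations.values())
        edges[origin] = {
            dest: 1.0 - math.log(volume / total)
            for dest, volume in destinations.items()
            if volume > 0
        }
    # Standard Dijkstra over non-negative edge lengths.
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, length in edges.get(node, {}).items():
            candidate = d + length
            if candidate < dist.get(nxt, math.inf):
                dist[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return dist


# Hypothetical toy network: most traffic from the source goes through a hub.
toy_flux = {
    "Source": {"Hub": 900.0, "Small": 100.0},
    "Hub": {"Small": 500.0, "Far": 500.0},
}
print(effective_distances(toy_flux, "Source"))
```

In this toy network "Small" ends up closer via the hub than via its direct but thin route, which mirrors the abstract's observation that major hubs emerge as likely gateways.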

12

Performance evaluation and benchmarking across 16 large language models on a comprehensive real-world emergency department triage data set

Benning, L.; Hirsch, A.; Groeschel, M.; Roeschl, T.; Spott, M.; Hans, F. P.; Urban, T.; Busch, H.-J.; Meyer, A.; Madrid, J.

2026-06-05 health informatics 10.64898/2026.05.28.26353935 medRxiv

Top 2%

1.6%

Background Emergency department (ED) triage is a high-stakes clinical decision process that determines patient prioritization and resource allocation under time pressure. Large language models (LLMs) have recently been proposed as decision-support tools for triage, yet most evaluations rely on simulated scenarios or curated datasets. Evidence from real-world clinical environments remains limited. The objective of this project was to systematically evaluate the performance, calibration, and reproducibility of multiple contemporary large language models for Emergency Severity Index (ESI) classification and sectoral allocation (ED vs. urgent care practice, UCP) using a comprehensive real-world triage dataset. Material and Methods Retrospective cross-sectional benchmarking study conducted at a tertiary academic ED in Germany with an integrated central point of assessment (CPA). The study included all consecutive adult walk-in encounters (>18 years) presenting between October 2023 and February 2024 (N = 16,107). Data were collected from a structured clinical decision support system capturing presenting complaints, vital signs, and triage decisions recorded by specialized nursing staff. The data comprised structured clinical variables routinely collected at triage, including presenting complaint categories (CEDIS-PCL), vital signs according to the ABCDE framework, and additional structured or free-text clinical information. Results The primary outcome was the agreement between LLM-predicted and nurse-assigned ESI levels, measured using quadratic-weighted Cohen's κ. Secondary outcomes included sectoral assignment agreement, misclassification patterns (over- and under-triage), calibration metrics, and output reproducibility. Quadratic-weighted κ values ranged from 0.18 to 0.75 across models. Only a structured stepwise prompting strategy achieved substantial agreement (κ_qw = 0.747), approaching reported human inter-rater reliability. 
Most models demonstrated moderate or lower agreement and systematic overconfidence, with expected calibration errors (ECE) based on verbalized confidence ranging from 0.099 to 0.355. Sectoral assignment agreement (i.e., ED vs. UCP) was uniformly low (κ < 0.30). Reproducibility testing revealed substantial output variability in 23% of cases, indicating non-deterministic behavior for clinically relevant decisions. Conclusions Current large language models demonstrate heterogeneous and generally limited performance in real-world emergency triage tasks. Structured algorithm-guided prompting appears more influential than model architecture or size. Before clinical implementation, improvements in calibration, reliability, and workflow integration are required, alongside regulatory-compliant validation in prospective clinical settings.
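The two headline metrics can be sketched in plain Python: quadratic-weighted Cohen's κ for ordinal ESI levels (1-5), which penalizes disagreements by the squared distance between levels, and expected calibration error (ECE), which bins predictions by stated confidence and averages the gap between accuracy and mean confidence in each bin. The function names, the 10 equal-width bins and the toy data are assumptions for illustration; the abstract does not give the study's binning scheme.

```python
from typing import Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], n_levels: int) -> float:
    """Cohen's kappa with quadratic weights for ordinal labels 1..n_levels (e.g. ESI 1-5).

    Undefined (division by zero) if the expected weighted disagreement is 0,
    e.g. when both raters assign one identical level to every case.
    """
    n = len(a)
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for x, y in zip(a, b):
        observed[x - 1][y - 1] += 1.0
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]
    weighted_observed = weighted_expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


def expected_calibration_error(confidence: Sequence[float], correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """ECE over equal-width bins (lo, hi]; a confidence of exactly 0 goes in the first bin."""
    n = len(confidence)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        members = [k for k, c in enumerate(confidence) if lo < c <= hi or (b == 0 and c == 0.0)]
        if not members:
            continue
        accuracy = sum(1.0 for k in members if correct[k]) / len(members)
        mean_conf = sum(confidence[k] for k in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


nurse = [1, 2, 3, 3, 4, 5, 2, 3]
model = [1, 2, 3, 4, 4, 5, 3, 3]
print(round(quadratic_weighted_kappa(nurse, model, n_levels=5), 3))
print(round(expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, False, True, True]), 3))
```

The (i - j)² / (k - 1)² normalization cancels in the ratio, so this gives the same value as the unnormalized quadratic weights used by common statistics libraries.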

13

Local Influenza Forecasts Outperform State-Level Forecasts in the United States

Kim, D.; Pasco, R.; Johnson, K. E.; Fox, S. J.; Reich, N. G.; Meyers, L. A.

2026-06-08 infectious diseases 10.64898/2026.06.04.26354836 medRxiv

Top 2%

1.5%

Accurate outbreak forecasts are critical for timely and effective public health response. In the United States, however, most forecasts are produced at the state level, which can mask substantial sub-state heterogeneity and limit their utility for local planning. We generated and evaluated forecasts of the percentage of Emergency Department visits attributable to influenza across 173 large metropolitan Health Service Areas (HSAs) using a gradient boosting quantile regression (GBQR) model, and compared their accuracy with that of forecasts derived from state-level data alone. At one-, two- and three-week horizons, local forecasts outperformed state-based forecasts in 98.8%, 90.8%, and 78.6% of HSAs, respectively, achieving mean weighted interval scores that were on average 39.2% lower (95% range: 5.9% to 76.7%), 19.6% lower (-6.3% to 59.5%), and 11.4% lower (-11.7% to 44.9%), respectively. The performance advantage of local forecasting was strongest in HSAs representing a smaller share of their state's population and increased with the proportion of the HSA population living in urban areas and with the number of metropolitan areas within a state. These results, based on an analysis of HSAs with populations greater than 250,000, demonstrate that fine-scale modeling can substantially improve forecast accuracy and highlight the potential value of local forecasts for outbreak preparedness and response.
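For readers unfamiliar with the weighted interval score (WIS), the commonly used definition combines the absolute error of the predictive median (weight 1/2) with the interval score of each central (1 - α) prediction interval (weight α/2), normalized by K + 1/2 for K intervals. Lower WIS is better, so "X% lower" corresponds to (WIS_state - WIS_local) / WIS_state. This is a minimal sketch, not the authors' scoring code; the forecast values and function names are hypothetical.

```python
from typing import Dict, Tuple


def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Score for a central (1 - alpha) interval: width plus penalties for misses."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    if y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score


def weighted_interval_score(median: float, intervals: Dict[float, Tuple[float, float]],
                            y: float) -> float:
    """WIS with weight 1/2 on the median and alpha/2 on each interval, divided by K + 1/2."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


def percent_lower(local_wis: float, state_wis: float) -> float:
    """How much lower (in %) the local forecast's WIS is than the state-based one."""
    return 100.0 * (state_wis - local_wis) / state_wis


# Hypothetical forecasts of % ED visits due to influenza; observed value y = 5.0.
local = weighted_interval_score(4.8, {0.5: (4.4, 5.2), 0.2: (3.9, 5.8)}, y=5.0)
state = weighted_interval_score(3.5, {0.5: (3.0, 4.0), 0.2: (2.5, 4.6)}, y=5.0)
print(f"local WIS {local:.3f}, state-based WIS {state:.3f}, "
      f"{percent_lower(local, state):.1f}% lower")
```

Negative "percent lower" values, as in the lower bounds of the reported ranges, simply mean the local forecast scored worse than the state-based one for that HSA.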

14

A Three-Tier Operational Benchmark for Evaluating Large Language Models on Hospital Medication Safety

Proulx, J.; Daines, B.; Barton, M.; Leonard, M. E.; Garcia, J. A.; Young, B.; Snell, Q.; West, T. W.; Watson, S. R.; AlQaseer, M.; Louiset, M.; Maqsood, M. B.; Voutt-Goos, M. J.; Douma, C.; Kasbekar, N.; Jeffries, J.; Abu-Rahmeh, W.; Frush, K.; Grewal, D. K.; Bahsoun, M.; Leonard, M.; Frankel, A.; Classen, D. C.; Pestotnik, S. L.

2026-06-10 health informatics 10.64898/2026.06.05.26354271 medRxiv

Top 2%

1.5%

Objective. To introduce PsiBench, a clinically validated medication-safety benchmark for evaluating large language models (LLMs) against the standards used to certify hospital computerized provider order entry (CPOE) and electronic health record (EHR) systems, and a non-overlapping three-tier evaluation framework separating highest-stakes discrimination, the operational CDS regime, and category-correct alerting. Materials and Methods. PsiBench comprises 492 medication-safety scenarios across 11 safety categories, created by clinical pharmacology experts whose work underpins an annualized testing procedure used by more than 2,000 U.S. hospitals. The three-tier framework partitions the scenarios non-overlappingly: Discrimination (98 scenarios, 50 fatal vs 48 deception, near-balanced 51%/49%); Operational (394 scenarios, 261 serious unsafe plus 133 safe including 41 Excessive Alerts reclassified as operational negatives); and Attribution (311 alert-required scenarios). We evaluated 40 frontier LLMs from 10 providers over 3 runs per scenario at temperature 0.2 (or the provider default where temperature is not configurable), yielding 59,040 evaluations conducted April 21-23, 2026. Results. Headline binary performance on the full benchmark spans a wide range across the 40 models: F1 78.5%-92.3%, accuracy 65.4%-89.8%, sensitivity 81.4%-100.0%, specificity 6.1%-81.8%. Leading models by F1 (o4-mini 92.3%; o3 92.2%) pair high sensitivity with meaningful specificity; three models saturate sensitivity at 100% but fall below 25% specificity, indistinguishable from a naive always-alert classifier. The wide spread on a single headline metric motivates tier-specific analyses, developed in a separate clinical paper. Discussion and Conclusion. PsiBench and the three-tier framework operationalize a rigorous evaluation rubric for LLM medication safety, grounded in two decades of national hospital audit experience. 
The framework generalizes to any binary medication-safety classifier (rule-based, conventional ML, or LLM-driven), supporting tier-aware model selection and post-deployment surveillance.
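All of the headline metrics above come from a single binary confusion matrix. The sketch below is a minimal illustration, not the authors' evaluation code. It computes those metrics and scores a naive always-alert classifier. It assumes the positives are the 311 alert-required scenarios (50 fatal plus 261 serious unsafe) and the negatives are the remaining 181. The `Confusion` and `headline_metrics` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical container: counts for a binary alert/no-alert classifier."""
    tp: int  # alert-required scenario, model alerted
    fp: int  # safe scenario, model alerted
    tn: int  # safe scenario, model stayed silent
    fn: int  # alert-required scenario, model stayed silent


def headline_metrics(c: Confusion) -> dict:
    """Sensitivity, specificity, accuracy and F1 from one confusion matrix."""
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    specificity = c.tn / (c.tn + c.fp) if c.tn + c.fp else 0.0
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f1": f1}


# Assumed split: 311 alert-required (positive) vs 181 other (negative) scenarios.
POSITIVES, NEGATIVES = 311, 492 - 311

# A naive always-alert classifier flags every scenario.
always_alert = headline_metrics(Confusion(tp=POSITIVES, fp=NEGATIVES, tn=0, fn=0))

# Evaluation count: 40 models x 492 scenarios x 3 runs.
N_EVALUATIONS = 40 * 492 * 3
```

Under this assumed split, always alerting yields 100% sensitivity and 0% specificity. Its F1 is still about 77.5% (622/803). This shows how a high F1 can coexist with no ability to recognize safe orders.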

15

Investigation of the continuous spread of SARS-CoV-2 in the post pandemic time - Insights into the reason for the sustained spread despite the establishment of population immunity

Yi, B.

2026-06-08 epidemiology 10.64898/2026.06.05.26355009 medRxiv

Top 2%

1.5%

Despite a well-established global immune landscape, SARS-CoV-2 continues to spread and cause infection waves. Current understanding of the reasons behind this is limited, and the evolution and spreading trend of SARS-CoV-2 remain difficult to predict. It is therefore necessary to investigate whether the establishment of population immunity has changed the virus's evolution or spreading pattern. In this investigation, an overall analysis of SARS-CoV-2 spread over the past several years was carried out through a thorough genomic epidemiology study, with Germany chosen as a representative location in view of its systematic efforts in genomic surveillance. The growth advantage of a few predominant variants during their early spreading periods was evaluated using a logistic regression model. The results revealed that the major circulating SARS-CoV-2 variants since 2023 derive mainly from the Omicron BA.2 family. Since mid-2024, most predominant variants have arisen primarily through recombination, indicating that recombination-driven evolution may be the major driving force behind the continued spread of SARS-CoV-2 despite population immunity. Furthermore, the lower growth advantage of recently emerged variants may lead to a reduction in the frequency of infection waves. These findings suggest that although short-term spreading trends can be affected by specific viral features and the local immunity landscape, the long-term spreading trend is mainly determined by the genomic diversity of the viruses and can be predicted through phylogenetic and genomic epidemiology investigation. The results emphasize the importance of maintaining genomic surveillance of SARS-CoV-2, which is essential from both medical and research perspectives.
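The abstract reports that growth advantage was estimated with a logistic regression model, but it does not give the specification. The sketch below is hedged and rests on several assumptions: a binomial model in which the share of sequences assigned to a variant follows logit(p_t) = a + r*t, fitted by Newton-Raphson. Under these assumptions, the slope r is the variant's per-day logistic growth rate relative to other lineages. The function name, data layout, and synthetic data are hypothetical.

```python
import math


def fit_logistic_growth(days, variant_counts, totals, max_iter=100, tol=1e-12):
    """Binomial logistic fit of logit(p_t) = a + r*t by Newton-Raphson.

    Hypothetical helper: days are sampling days, variant_counts the sequences
    assigned to the variant, totals all sequences on that day.
    Returns (a, r); r is the per-day logistic growth rate.
    """
    t_bar = sum(days) / len(days)
    xs = [t - t_bar for t in days]  # centre time for numerical stability
    b0 = b1 = 0.0                   # intercept at t_bar, and slope r
    for _ in range(max_iter):
        g0 = g1 = s0 = s1 = s2 = 0.0
        for x, k, n in zip(xs, variant_counts, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = k - n * p        # score (gradient) contribution
            w = n * p * (1.0 - p)    # Fisher information weight
            g0 += resid
            g1 += resid * x
            s0 += w
            s1 += w * x
            s2 += w * x * x
        det = s0 * s2 - s1 * s1
        d0 = (s2 * g0 - s1 * g1) / det  # solve the 2x2 Newton system
        d1 = (s0 * g1 - s1 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0 - b1 * t_bar, b1  # shift intercept back to t = 0


# Synthetic, noise-free counts generated with a = -4, r = 0.1 (illustrative only).
days = list(range(60))
totals = [100] * len(days)
counts = [n / (1.0 + math.exp(-(-4.0 + 0.1 * t))) for t, n in zip(days, totals)]
a_hat, r_hat = fit_logistic_growth(days, counts, totals)
```

Because the synthetic counts are noise-free, the fit should recover a = -4 and r = 0.1. Real surveillance counts would additionally need uncertainty estimates, which are omitted here.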

16

When Algorithms Prescribe: A Cross-Sectional Study of Quality, Misinformation, and Engagement in Statin-Related Content on TikTok

Gharibyan, I.; Ahner, E.; Shao, R.; Sharma, D.; Navarsartian Tazehkand, T.; Diep, J.; Assoumou, B.

2026-06-08 health informatics 10.64898/2026.06.04.26354962 medRxiv

Top 2%

1.4%

Background: Statins are key to preventing atherosclerotic cardiovascular disease, lowering low-density lipoprotein cholesterol and reducing cardiovascular events. However, skepticism regarding their safety and value persists and is increasingly influenced by social media. TikTok has emerged as a major source of health information, but its content varies in quality and accuracy. This study evaluated the quality, expressed attitudes, misinformation, and engagement of statin-related TikTok content. Methods: Public TikTok videos were collected using predefined search terms and coded by creator type, thematic content, and overall attitude. Video quality was assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Audiovisual Materials, and the Global Quality Score. False or misleading claims were independently reviewed by two cardiology fellows. Associations between engagement and quality were also examined. Results: Of 1,349 screened videos, 258 met inclusion criteria. Most were educational (91.0%), with non-physician healthcare providers (34.5%) as the largest creator group. Risks or negative effects were discussed more often than benefits (63.2% vs 42.2%), and 39.5% contained at least one false or misleading claim, most often from complementary and alternative medicine providers and wellness promoters. Quality differed by creator type across all instruments, with physician-created content scoring highest. Video popularity showed minimal association with informational quality. Conclusion: Statin-related TikTok content frequently emphasizes harms, often contains misinformation, and varies substantially in quality by creator type. Greater involvement of healthcare professionals on social media may help improve digital health literacy and counter misleading information about statin therapy.
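The abstract does not name the statistic used to relate engagement to quality. Spearman's rank correlation is one common choice for heavily skewed engagement counts such as views. The sketch below assumes that choice and uses tie-averaged ranks. The variable names and numbers are invented for illustration and are not study data.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Invented values, purely to show the calculation (not study data).
views = [1200, 56000, 300, 9800, 150000]
discern = [52, 38, 61, 45, 30]
rho = spearman_rho(views, discern)
```

Working on ranks rather than raw counts keeps a handful of viral videos from dominating the association estimate.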

17

Technology acceptance of machine learning in life sciences: the role of hype perception and journal impact factor.

Serrano, A. E.

2026-06-09 health informatics 10.64898/2026.06.03.26354262 medRxiv

Top 2%

1.4%

Machine learning (ML) has emerged as a transformative technology across biomedical and life science sectors, with applications spanning drug discovery, medical imaging, genomics, and clinical decision support (Goecks et al., 2020; Patel et al., 2020). Despite exponential growth in ML-related publications, from fewer than 100 articles in 2003 to nearly 25,000 by 2021 (NCBI, 2022), adoption among industry professionals remains uneven and sector-dependent. Understanding what drives or inhibits this adoption is critical for organisations seeking to leverage ML capabilities in research and clinical practice. Technology adoption in organisational contexts has been extensively studied through the Technology Acceptance Model (TAM), originally proposed by Davis (1989) and subsequently extended to incorporate external variables influencing perceived usefulness (PU) and perceived ease of use (PEU) (Venkatesh & Davis, 1996). While TAM has been applied across multiple industries, its application within biomedical and life science contexts remains limited, and the industry-specific factors that shape ML acceptance in this sector have not been systematically examined. Two external variables are particularly relevant to life science professionals. First, the bibliometric journal impact factor (JIF) functions as a cognitive signal of scientific credibility in a sector where evidence-based decision-making is culturally embedded and publication quality serves as a proxy for technological legitimacy (Garfield, 1996). Second, technology hype, operationalised through the Gartner Hype Cycle framework, represents a social influence variable that shapes organisational expectations and investment decisions around emerging technologies (Gartner Inc., 2018). Whether these variables influence ML acceptance among life science professionals, alongside individual knowledge and experience, has not been empirically tested.
This study addresses that gap by investigating ML technology acceptance among 213 biomedical and life science professionals across EMEA, LATAM, and North America, using a cross-sectional quantitative survey and PLS-SEM analysis. The TAM model is extended with three external variables, JIF, technology hype, and prior knowledge and experience, to test their influence on PU and PEU in this specific professional context. Additionally, the study examines demographic and regional differences in ML acceptance, with particular attention to variation between academic researchers and healthcare professionals. The findings contribute a validated, sector-specific extension of TAM for life sciences, provide actionable insights for organisations seeking to accelerate ML implementation, and establish a framework for future subsector-specific research.

18

Bias from small-count suppression in county-level cancer disparity estimates: a calibrated simulation study

Gahan, K.

2026-06-08 epidemiology 10.64898/2026.06.05.26355021 medRxiv

Top 2%

1.3%

Background. Area-level cancer disparities are routinely estimated from public county data in which rates based on small counts (fewer than 16 cases or deaths) are suppressed. Analysts typically drop suppressed counties (complete-case analysis). Because suppression depends on case counts tied to population size and demographic composition, this missingness may be informative, but its effect on the disparity estimate has not, to our knowledge, been quantified. Methods. In a cross-sectional ecological study of 3,143 U.S. counties (analytic sample 3,018 with computable exposure) using one frozen public release of NCI State Cancer Profiles incidence and mortality data and ACS 2018-2022 5-year data, we estimated the most- versus least-deprived ICE(race+income) quintile rate ratio (RR) and rate difference for female breast, stomach, and cervix cancers under four suppression-handling methods: complete-case, available-case, bounding, and model-based small-area estimation. We characterized which counties were erased, and, following the ADEMP framework, ran a Monte Carlo simulation (1,000 replicates per cell; Monte Carlo standard error of bias approximately 0.0025) calibrated to the release to measure bias against a known truth. Analyses were pre-registered. Results. The suppressed fraction rose with rarity: 7.4% of counties for breast, 61.3% for stomach, and 75.7% for cervix incidence. Suppression was concentrated in the most-deprived quintile (cervix: 81.8% suppressed vs 63.8% in the least-deprived quintile) and overwhelmingly removed rural rather than minority residents (cervix: 81% of the rural but 9% of the minority population erased). For breast (little suppression) the RR was 0.87 (95% CI 0.85-0.89) and identical across methods; for cervix incidence the complete-case RR (1.56) exceeded the model-based estimate (1.50), and for cervix mortality (91% suppressed) the model-based estimate (1.56) was 16% lower than the complete-case estimate (1.86), with a wide bounding interval (1.88-2.62).
In calibrated simulation, population-weighted complete-case bias was small (less than 2%) at the observed deprivation-county-size correlation and grew with rarity, threshold, and unweighted aggregation; its direction was conditional, becoming positive (over-estimation) as deprived counties became smaller. Conclusions. Complete-case handling of suppressed counties over-estimates rare-cancer area disparities relative to methods that retain them, while silently erasing most of the rural and most-deprived communities the estimate is meant to represent. The effect is negligible for common cancers and grows with rarity. Public-data disparity analyses should report the suppressed fraction and use bounded or model-based estimates by default. Keywords: cancer disparities; small-count suppression; Index of Concentration at the Extremes; informative missingness; small-area estimation; rural health.
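The sketch below makes the contrast between complete-case handling and bounding concrete. It compares the most- and least-deprived quintiles using pooled crude rates (total events divided by total population). This is a simplification, and the study's own rate definitions are not reproduced here. Only the fewer-than-16 suppression rule comes from the abstract. County counts are hypothetical, county populations are assumed known, and each suppressed count is only known to lie in [0, 15].

```python
SUPPRESSION_THRESHOLD = 16  # rates from fewer than 16 events are suppressed


def pooled_rate(counties):
    """Crude pooled rate: total events / total population over (events, pop) pairs."""
    return sum(c for c, _ in counties) / sum(p for _, p in counties)


def is_suppressed(county):
    return county[0] < SUPPRESSION_THRESHOLD


def rr_complete_case(most_deprived, least_deprived):
    """Drop suppressed counties in each quintile, then take the rate ratio."""
    kept5 = [c for c in most_deprived if not is_suppressed(c)]
    kept1 = [c for c in least_deprived if not is_suppressed(c)]
    return pooled_rate(kept5) / pooled_rate(kept1)


def rr_bounds(most_deprived, least_deprived):
    """Bound the rate ratio using published counts plus suppressed counts in [0, 15]."""
    def rate_range(counties):
        pop = sum(p for _, p in counties)
        published = sum(c for c, _ in counties if not is_suppressed((c, _)))
        n_hidden = sum(1 for c in counties if is_suppressed(c))
        upper = published + n_hidden * (SUPPRESSION_THRESHOLD - 1)
        return published / pop, upper / pop

    lo5, hi5 = rate_range(most_deprived)
    lo1, hi1 = rate_range(least_deprived)
    return lo5 / hi1, hi5 / lo1


# Hypothetical (events, population) pairs; the second county in each is suppressed.
q5 = [(30, 50_000), (5, 20_000)]     # most-deprived quintile
q1 = [(50, 100_000), (10, 40_000)]   # least-deprived quintile
true_rr = pooled_rate(q5) / pooled_rate(q1)
cc_rr = rr_complete_case(q5, q1)
lower_rr, upper_rr = rr_bounds(q5, q1)
```

With these invented numbers, the true pooled RR is 7/6 (about 1.17), and complete-case analysis overstates it at 1.2. The bounds run from 12/13 to 1.8. In this toy case the bounding interval contains the truth, but the complete-case point estimate does not equal it.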

19

A Decade of the Centers for Disease Control and Prevention's FluSight Influenza Forecasting

Hines, A. G.; Mathis, S. M.; Johansson, M. A.; Biggerstaff, M.; Reed, C.; Borchering, R.

2026-06-08 epidemiology 10.64898/2026.06.05.26354941 medRxiv

Top 2%

1.3%

Since the U.S. 2013/14 influenza season, the CDC's FluSight Challenge has provided a platform for evaluating influenza forecasting models and fostering collaboration across institutions. The Challenge aims to improve the science and enhance the utility of infectious disease forecasts for public health decision making. We analyzed ten seasons of submitted forecasts, 2014/15-2019/20 (influenza-like illness seasons) and 2021/22-2024/25 (hospital admissions seasons), across a range of model types, including statistical, mechanistic, machine learning, and hybrid models. Influenza-like illness (ILI) forecasts were evaluated using the exponentiated logarithmic score (skill metric), while hospital admissions forecasts were evaluated using the log-transformed relative Weighted Interval Score. Performance differences between model types were assessed using Wilcoxon rank-sum tests, and associations with team participation history were evaluated using Spearman's rank correlation. Model performance varied by season, and no single model type consistently outperformed others. In ILI seasons, statistical models generally performed better than mechanistic and machine learning models, though consistent differences were not observed in more recent hospital admissions seasons. Ensemble forecasts showed better overall performance across seasons, and the CDC's FluSight ensemble ranked among the top-performing forecasts every year. We also found a positive correlation between forecast accuracy and the number of years a team participated in the Challenge, with statistically significant associations in four seasons. These findings highlight the benefits of ensemble approaches and sustained engagement in improving forecasting performance, while also underscoring the continued value of forecast evaluation before and after the COVID-19 pandemic.
Insights from the FluSight Challenge can guide future infectious disease forecasting efforts and support more effective public health preparedness.
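In the hospital admissions seasons, forecasts are scored with the Weighted Interval Score. The sketch below uses the commonly cited definition: median weight 1/2, interval weights alpha_k/2, and normalisation by K + 1/2. It adds a simplified log ratio against a single baseline. FluSight's own relative WIS may be computed differently, for example through pairwise model comparisons. Treat `log_relative_wis` and the example forecast as hypothetical.

```python
import math


def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval."""
    score = upper - lower
    if y < lower:
        score += (2 / alpha) * (lower - y)  # penalty for undershooting
    if y > upper:
        score += (2 / alpha) * (y - upper)  # penalty for overshooting
    return score


def weighted_interval_score(median, intervals, y):
    """WIS for a median plus K central intervals given as (alpha, lower, upper)."""
    total = 0.5 * abs(y - median)
    for alpha, lower, upper in intervals:
        total += (alpha / 2) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


def log_relative_wis(model_wis, baseline_wis):
    """Hypothetical simplification: log ratio to one baseline (negative = better)."""
    return math.log(model_wis / baseline_wis)


# Hypothetical forecast: median 10 admissions, 50% interval [8, 12], 14 observed.
wis = weighted_interval_score(10, [(0.5, 8, 12)], 14)
```

Lower WIS is better. The log transform makes equal multiplicative improvements and degradations relative to the baseline symmetric around zero.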

20

A hierarchical clinical fusion transformer model for personalized opioid treatment: Development and validation in diabetic surgical patients

Naderalvojoud, B.; Sutjiadi, B. J.; Koul, A.; Curtin, C.; Gevaert, O.; Hernandez-Boussard, T.

2026-06-08 health informatics 10.64898/2026.06.04.26353331 medRxiv

Top 2%

1.3%

Background Machine learning (ML) models are increasingly used to predict adverse outcomes after surgery. However, most rely on static patient characteristics (e.g., age, comorbidities) and overlook clinician-controlled treatment decisions that can be actively modified at the point of care. Discharge opioid prescribing is a key modifiable, clinician-controlled decision, yet optimizing prescribing choices across multiple adverse outcomes remains underexplored in predictive modeling. This study addresses that gap by introducing a novel ML framework that explicitly separates fixed patient risk factors from modifiable prescribing options to support personalized, risk-informed opioid prescribing decisions. Methods We developed the Hierarchical Clinical Fusion Transformer (HCF-Transformer), an ML model designed to estimate patient-specific risks across four postoperative outcomes: prolonged opioid use (POU), chronic pain (CP), 30-day readmission, and opioid-associated outcomes (OAO). The model constructs patient risk profiles from fixed, non-modifiable baseline factors, which are then processed by a transformer layer. Clinician-controllable discharge opioid regimens are modeled as alternative intervention candidates and fused with the fixed risk representation through a clinical fusion mechanism, enabling assessment and ranking based on predicted risks. A Total Relative Risk (TRR) metric, calibrated to each outcome prediction threshold, guides the recommendation process. We evaluated the model in diabetic surgical patients, a common high-risk population. Results The study included 157,853 unique diabetic surgical patients, with outcome prevalences ranging from 47.2% (POU) to 1.8% (OAO). The HCF-Transformer achieved the highest AUROCs (0.798 for POU, 0.712 for 30-day readmission, 0.808 for CP, and 0.922 for OAO), outperforming Random Forest, FT-Transformer, and ResNet-based models.
Compared to these baselines, HCF-Transformer generated more stable and discriminative risk estimates and demonstrated significant variation in TRR scores across discharge opioid options (ANOVA p < .01, eta-squared > .01). This enabled consistent identification of lower-risk regimens tailored to patient-specific profiles. Conclusions The HCF-Transformer introduces a novel hierarchical fusion approach to optimize opioid prescribing by integrating static patient risk profiles with modifiable discharge options. Using transformer-based modeling and a quantifiable TRR metric, the model delivers personalized, risk-aware recommendations. This approach enables data-driven opioid prescribing tailored to individual risk and has the potential to improve postoperative outcomes in high-risk populations. Our findings demonstrate that integrating modifiable factors with structured risk profiles through a transformer-based fusion architecture can enhance decision-support systems, paving the way for more actionable and personalized AI in healthcare.
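The abstract does not give the TRR formula. Purely as a hypothetical illustration, the sketch below scores each discharge regimen by summing its predicted outcome risks, each divided by that outcome's decision threshold. It then ranks regimens from lowest to highest score. The formula, outcome keys, thresholds, and risks are all assumptions, not the HCF-Transformer's actual metric.

```python
def total_relative_risk(risks, thresholds):
    """Hypothetical TRR: sum of predicted risk / decision threshold per outcome."""
    return sum(risks[outcome] / thresholds[outcome] for outcome in thresholds)


def rank_regimens(candidates, thresholds):
    """Return regimen names ordered from lowest to highest hypothetical TRR."""
    return sorted(candidates,
                  key=lambda name: total_relative_risk(candidates[name], thresholds))


# Invented thresholds and per-regimen predicted risks for one patient.
thresholds = {"POU": 0.45, "CP": 0.20, "readmission": 0.10, "OAO": 0.02}
candidates = {
    "regimen_A": {"POU": 0.50, "CP": 0.22, "readmission": 0.08, "OAO": 0.010},
    "regimen_B": {"POU": 0.38, "CP": 0.20, "readmission": 0.08, "OAO": 0.012},
}
ranking = rank_regimens(candidates, thresholds)
```

Dividing by each outcome's threshold puts a common outcome such as POU and a rare one such as OAO on a comparable scale before summing. This is one plausible reading of "calibrated to each outcome prediction threshold".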